-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[X86] Align f128 and i128 to 16 bytes when passing on x86-32 #138092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I'm not sure why this seems to have no effect; the tests at llvm/test/CodeGen/X86/fp128-abi.ll should fail since they need to be updated, but there is no change in codegen. I wonder if this is getting fixed up somewhere. |
fp128 is not a legal type, so this won't reach here (this is a super frustrating detail of how calling convention handling is implemented, it would be easier if the backend directly consumed the original type) |
Should it be made a legal type? I’m not sure what the exact definition of that is, but it is in the i386 ABI as of recent versions to be returned and (presumably, since I haven’t seen otherwise) passed in memory. |
No. There are no f128 registers or operations |
How does stack alignment get handled for non-legal types, or where does this need to get fixed up? I’m surprised passing doesn’t use the type’s natural alignment. |
The type is broken up into legal pieces according to getRegisterTypeForCallingConv + getNumRegistersForCallingConv, and those pieces are processed by the call lowering. This code is entirely disconnected from the decision of where to pass the value, in a register or at some stack offset |
a3b1041
to
3b2a385
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
3b2a385
to
ee3b81f
Compare
@llvm/pr-subscribers-backend-x86 Author: Trevor Gross (tgross35) ChangesThe i386 psABI specifies that i386 does not specify an Fixes: #77401 Patch is 551.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138092.diff 52 Files Affected:
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index daf822388a2ff..e91460d3a551c 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -228,6 +228,8 @@ Changes to the X86 Backend
--------------------------
* `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
+* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
+ alignment.
Changes to the OCaml bindings
-----------------------------
diff --git a/llvm/lib/Target/X86/X86CallingConv.cpp b/llvm/lib/Target/X86/X86CallingConv.cpp
index 0b4c63f7a81f7..eb39259f7166b 100644
--- a/llvm/lib/Target/X86/X86CallingConv.cpp
+++ b/llvm/lib/Target/X86/X86CallingConv.cpp
@@ -374,5 +374,37 @@ static bool CC_X86_64_I128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
return true;
}
+/// Special handling for i128 and fp128: on x86-32, i128 and fp128 get legalized
+/// as four i32s, but fp128 must be passed on the stack with 16-byte alignment.
+/// Technically only fp128 has a specified ABI, but it makes sense to handle
+/// i128 the same until we hear differently.
+static bool CC_X86_32_I128_FP128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
+ CCValAssign::LocInfo &LocInfo,
+ ISD::ArgFlagsTy &ArgFlags, CCState &State) {
+ assert(ValVT == MVT::i32 && "Should have i32 parts");
+ SmallVectorImpl<CCValAssign> &PendingMembers = State.getPendingLocs();
+ PendingMembers.push_back(
+ CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));
+
+ if (!ArgFlags.isInConsecutiveRegsLast())
+ return true;
+
+ unsigned NumRegs = PendingMembers.size();
+ assert(NumRegs == 4 && "Should have two parts");
+
+ int64_t Offset = State.AllocateStack(16, Align(16));
+ PendingMembers[0].convertToMem(Offset);
+ PendingMembers[1].convertToMem(Offset + 4);
+ PendingMembers[2].convertToMem(Offset + 8);
+ PendingMembers[3].convertToMem(Offset + 12);
+
+ State.addLoc(PendingMembers[0]);
+ State.addLoc(PendingMembers[1]);
+ State.addLoc(PendingMembers[2]);
+ State.addLoc(PendingMembers[3]);
+ PendingMembers.clear();
+ return true;
+}
+
// Provides entry points of CC_X86 and RetCC_X86.
#include "X86GenCallingConv.inc"
diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index 823e0caa02262..f020e0b55141c 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -859,6 +859,11 @@ def CC_X86_32_C : CallingConv<[
// The 'nest' parameter, if any, is passed in ECX.
CCIfNest<CCAssignToReg<[ECX]>>,
+ // i128 and fp128 need to be passed on the stack with a higher alignment than
+ // their legal types. Handle this with a custom function.
+ CCIfType<[i32],
+ CCIfConsecutiveRegs<CCCustom<"CC_X86_32_I128_FP128">>>,
+
// On swifttailcc pass swiftself in ECX.
CCIfCC<"CallingConv::SwiftTail",
CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[ECX]>>>>,
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index 9ad355311527b..b4639ac2577e8 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -237,9 +237,18 @@ EVT X86TargetLowering::getSetCCResultType(const DataLayout &DL,
bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters(
Type *Ty, CallingConv::ID CallConv, bool isVarArg,
const DataLayout &DL) const {
- // i128 split into i64 needs to be allocated to two consecutive registers,
- // or spilled to the stack as a whole.
- return Ty->isIntegerTy(128);
+ // On x86-64 i128 is split into two i64s and needs to be allocated to two
+ // consecutive registers, or spilled to the stack as a whole. On x86-32 i128
+ // is split to four i32s and never actually passed in registers, but we use
+ // the consecutive register mark to match it in TableGen.
+ if (Ty->isIntegerTy(128))
+ return true;
+
+ // On x86-32, fp128 acts the same as i128.
+ if (Subtarget.is32Bit() && Ty->isFP128Ty())
+ return true;
+
+ return false;
}
/// Helper for getByValTypeAlignment to determine
diff --git a/llvm/test/CodeGen/X86/abds-neg.ll b/llvm/test/CodeGen/X86/abds-neg.ll
index f6d66ab47ce05..2911edfbfd409 100644
--- a/llvm/test/CodeGen/X86/abds-neg.ll
+++ b/llvm/test/CodeGen/X86/abds-neg.ll
@@ -367,44 +367,49 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_ext_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: subl %ecx, %eax
-; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: sbbl %edi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: sbbl %esi, %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll %eax, %esi
-; X86-NEXT: cmovll %ebx, %edi
-; X86-NEXT: cmovll %ebp, %edx
-; X86-NEXT: cmovll (%esp), %ecx # 4-byte Folded Reload
-; X86-NEXT: xorl %ebx, %ebx
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 40(%ebp), %ecx
+; X86-NEXT: movl 44(%ebp), %eax
+; X86-NEXT: movl 24(%ebp), %edx
+; X86-NEXT: movl 28(%ebp), %esi
+; X86-NEXT: subl %ecx, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %esi, %edx
+; X86-NEXT: sbbl %eax, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 48(%ebp), %edx
+; X86-NEXT: movl 32(%ebp), %ebx
+; X86-NEXT: sbbl %edx, %ebx
+; X86-NEXT: movl 52(%ebp), %esi
+; X86-NEXT: movl 36(%ebp), %edi
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: subl 24(%ebp), %ecx
+; X86-NEXT: sbbl 28(%ebp), %eax
+; X86-NEXT: sbbl 32(%ebp), %edx
+; X86-NEXT: sbbl 36(%ebp), %esi
+; X86-NEXT: cmovll %edi, %esi
+; X86-NEXT: cmovll %ebx, %edx
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT: xorl %edi, %edi
; X86-NEXT: negl %ecx
-; X86-NEXT: movl $0, %ebp
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl $0, %edx
-; X86-NEXT: sbbl %edi, %edx
-; X86-NEXT: sbbl %esi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %ecx, (%eax)
-; X86-NEXT: movl %ebp, 4(%eax)
-; X86-NEXT: movl %edx, 8(%eax)
-; X86-NEXT: movl %ebx, 12(%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: movl $0, %ebx
+; X86-NEXT: sbbl %eax, %ebx
+; X86-NEXT: movl $0, %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: movl 8(%ebp), %edx
+; X86-NEXT: movl %ecx, (%edx)
+; X86-NEXT: movl %ebx, 4(%edx)
+; X86-NEXT: movl %eax, 8(%edx)
+; X86-NEXT: movl %edi, 12(%edx)
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -438,44 +443,49 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_ext_i128_undef:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: subl %ecx, %eax
-; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: sbbl %edi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: sbbl %esi, %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll %eax, %esi
-; X86-NEXT: cmovll %ebx, %edi
-; X86-NEXT: cmovll %ebp, %edx
-; X86-NEXT: cmovll (%esp), %ecx # 4-byte Folded Reload
-; X86-NEXT: xorl %ebx, %ebx
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 40(%ebp), %ecx
+; X86-NEXT: movl 44(%ebp), %eax
+; X86-NEXT: movl 24(%ebp), %edx
+; X86-NEXT: movl 28(%ebp), %esi
+; X86-NEXT: subl %ecx, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %esi, %edx
+; X86-NEXT: sbbl %eax, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 48(%ebp), %edx
+; X86-NEXT: movl 32(%ebp), %ebx
+; X86-NEXT: sbbl %edx, %ebx
+; X86-NEXT: movl 52(%ebp), %esi
+; X86-NEXT: movl 36(%ebp), %edi
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: subl 24(%ebp), %ecx
+; X86-NEXT: sbbl 28(%ebp), %eax
+; X86-NEXT: sbbl 32(%ebp), %edx
+; X86-NEXT: sbbl 36(%ebp), %esi
+; X86-NEXT: cmovll %edi, %esi
+; X86-NEXT: cmovll %ebx, %edx
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT: xorl %edi, %edi
; X86-NEXT: negl %ecx
-; X86-NEXT: movl $0, %ebp
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl $0, %edx
-; X86-NEXT: sbbl %edi, %edx
-; X86-NEXT: sbbl %esi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %ecx, (%eax)
-; X86-NEXT: movl %ebp, 4(%eax)
-; X86-NEXT: movl %edx, 8(%eax)
-; X86-NEXT: movl %ebx, 12(%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: movl $0, %ebx
+; X86-NEXT: sbbl %eax, %ebx
+; X86-NEXT: movl $0, %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: movl 8(%ebp), %edx
+; X86-NEXT: movl %ecx, (%edx)
+; X86-NEXT: movl %ebx, 4(%edx)
+; X86-NEXT: movl %eax, 8(%edx)
+; X86-NEXT: movl %edi, 12(%edx)
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -639,55 +649,59 @@ define i128 @abd_minmax_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_minmax_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: cmpl %eax, %esi
-; X86-NEXT: sbbl %ebx, %ecx
-; X86-NEXT: movl %edx, %ecx
-; X86-NEXT: sbbl %ebp, %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl %edx, %ecx
-; X86-NEXT: sbbl %edi, %ecx
-; X86-NEXT: movl %edi, %ecx
-; X86-NEXT: cmovll %edx, %ecx
-; X86-NEXT: movl %ecx, (%esp) # 4-byte Spill
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: movl %ebx, %ecx
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl %eax, %edx
-; X86-NEXT: cmovll %esi, %edx
-; X86-NEXT: cmpl %esi, %eax
-; X86-NEXT: movl %ebx, %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl %edi, %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %eax
-; X86-NEXT: subl %eax, %edx
-; X86-NEXT: sbbl %ebx, %ecx
-; X86-NEXT: sbbl %esi, %ebp
-; X86-NEXT: movl (%esp), %esi # 4-byte Reload
-; X86-NEXT: sbbl %edi, %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %edx, (%eax)
-; X86-NEXT: movl %ecx, 4(%eax)
-; X86-NEXT: movl %ebp, 8(%eax)
-; X86-NEXT: movl %esi, 12(%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 40(%ebp), %esi
+; X86-NEXT: movl 24(%ebp), %edi
+; X86-NEXT: movl 28(%ebp), %eax
+; X86-NEXT: cmpl %esi, %edi
+; X86-NEXT: sbbl 44(%ebp), %eax
+; X86-NEXT: movl 48(%ebp), %edx
+; X86-NEXT: movl 32(%ebp), %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: movl 52(%ebp), %ebx
+; X86-NEXT: movl 36(%ebp), %ecx
+; X86-NEXT: movl %ecx, %eax
+; X86-NEXT: sbbl %ebx, %eax
+; X86-NEXT: movl %ebx, %eax
+; X86-NEXT: cmovll %ecx, %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: cmovll 32(%ebp), %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 44(%ebp), %eax
+; X86-NEXT: cmovll 28(%ebp), %eax
+; X86-NEXT: movl %esi, %ecx
+; X86-NEXT: cmovll %edi, %ecx
+; X86-NEXT: cmpl %edi, %esi
+; X86-NEXT: movl 44(%ebp), %edi
+; X86-NEXT: sbbl 28(%ebp), %edi
+; X86-NEXT: movl %edx, %edi
+; X86-NEXT: sbbl 32(%ebp), %edi
+; X86-NEXT: movl %ebx, %edi
+; X86-NEXT: sbbl 36(%ebp), %edi
+; X86-NEXT: cmovll 36(%ebp), %ebx
+; X86-NEXT: cmovll 32(%ebp), %edx
+; X86-NEXT: movl 44(%ebp), %edi
+; X86-NEXT: cmovll 28(%ebp), %edi
+; X86-NEXT: cmovll 24(%ebp), %esi
+; X86-NEXT: subl %esi, %ecx
+; X86-NEXT: sbbl %edi, %eax
+; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Reload
+; X86-NEXT: sbbl %edx, %edi
+; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; X86-NEXT: sbbl %ebx, %esi
+; X86-NEXT: movl 8(%ebp), %edx
+; X86-NEXT: movl %ecx, (%edx)
+; X86-NEXT: movl %eax, 4(%edx)
+; X86-NEXT: movl %edi, 8(%edx)
+; X86-NEXT: movl %esi, 12(%edx)
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -848,37 +862,41 @@ define i128 @abd_cmp_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_cmp_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: subl %edx, %eax
-; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT: sbbl %esi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: sbbl %ecx, %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: sbbl %edi, %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: cmovgel (%esp), %edx # 4-byte Folded Reload
-; X86-NEXT: cmovgel %ebx, %esi
-; X86-NEXT: cmovgel %ebp, %ecx
-; X86-NEXT: cmovgel %eax, %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %edi, 12(%eax)
-; X86-NEXT: movl %ecx, 8(%eax)
-; X86-NEXT: movl %esi, 4(%eax)
-; X86-NEXT: movl %edx, (%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 24(%ebp), %ecx
+; X86-NEXT: movl 28(%ebp), %edx
+; X86-NEXT: movl 40(%ebp), %eax
+; X86-NEXT: movl 44(%ebp), %esi
+; X86-NEXT: subl %ecx, %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %esi, %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 32(%ebp), %esi
+; X86-NEXT: movl 48(%ebp), %edi
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: movl 36(%ebp), %ebx
+; X86-NEXT: movl 52(%ebp), %eax
+; X86-NEXT: sbbl %ebx, %eax
+; X86-NEXT: subl 40(%ebp), %ecx
+; X86-NEXT: sbbl 44(%ebp), %edx
+; X86-NEXT: sbbl 48(%ebp), %esi
+; X86-NEXT: sbbl 52(%ebp), %ebx
+; X86-NEXT: cmovgel {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT: cmovgel {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
+; X86-NEXT: cmovgel %edi, %esi
+; X86-NEXT: cmovgel %eax, %ebx
+; X86-NEXT: movl 8(%ebp), %eax
+; X86-NEXT: movl %ebx, 12(%eax)
+; X86-NEXT: movl %esi, 8(%eax)
+; X86-NEXT: movl %edx, 4(%eax)
+; X86-NEXT: movl %ecx, (%eax)
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -1118,35 +1136,39 @@ define i128 @abd_subnsw_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_subnsw_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl %ecx, %ebx
-; X86-NEXT: sarl $31, %ebx
-; X86-NEXT: xorl %ebx, %ecx
-; X86-NEXT: xorl %ebx, %edx
-; X86-NEXT: xorl %ebx, %esi
-; X86-NEXT: xorl %ebx, %edi
-; X86-NEXT: movl %ebx, %ebp
-; X86-NEXT: subl %edi, %ebp
-; X86-NEXT: movl %ebx, %edi
-; X86-NEXT: sbbl %esi, %edi
-; X86-NEXT: movl %ebx, %esi
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 36(%ebp), %eax
+; X86-NEXT: movl 32(%ebp), %ecx
+; X86-NEXT: movl 28(%ebp), %edx
+; X86-NEXT: movl 24(%ebp), %esi
+; X86-NEXT: subl 40(%ebp), %esi
+; X86-NEXT: sbbl 44(%ebp), %edx
+; X86-NEXT: sbbl 48(%ebp), %ecx
+; X86-NEXT: sbbl 52(%ebp), %eax
+; X86-NEXT: movl %eax, %edi
+; X86-NEXT: sarl $31, %edi
+; X86-NEXT: xorl %edi, %eax
+; X86-NEXT: xorl %edi, %ecx
+; X86-NEXT: xorl %edi, %edx
+; X86-NEXT: xorl %edi, %esi
+; X86-NEXT: movl %edi, %ebx
+; X86-NEXT: subl %esi, %ebx
+; X86-NEXT: movl %edi, %esi
; X86-NEXT: sbbl %edx, %esi
-; X86-NEXT: sbbl %ecx, %ebx
-; X86-NEXT: movl %ebp, (%eax)
-; X86-NEXT: movl %edi, 4(%eax)
-; X86-NEXT: movl %esi, 8(%eax)
-; X86-NEXT: movl %ebx, 12(%eax)
+; X86-NEXT: movl %edi, %edx
+; X86-NEXT: sbbl %ecx, %edx
+; X86-NEXT: sbbl %eax, %edi
+; X86-NEXT: movl 8(%ebp), %eax
+; X86-NEXT: movl %ebx, (%eax)
+; X86-NEXT: movl %esi, 4(%eax)
+; X86-NEXT: movl %edx, 8(%eax)
+; X86-NEXT: movl %edi, 12(%eax)
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -1175,35 +1197,39 @@ define i128 @abd_subnsw_i128_undef(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_subnsw_i128_undef:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl %ecx, %ebx
-; X86-NEXT: sarl $31, %ebx
-; X86-NEXT: xorl %ebx, %ecx
-; X86-NEXT: xorl %ebx, %edx
-; X86-NEXT: xorl %ebx, %esi
-; X86-NEXT: xorl %ebx, %edi
-; X86-NEXT: movl %ebx, %ebp
-; X86-NEXT: subl %edi, %ebp
-; X86-NEXT: movl %ebx, %edi
-; X86-NEXT: sbbl %esi, %edi
-; X86-NEXT: m...
[truncated]
|
bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters( | ||
Type *Ty, CallingConv::ID CallConv, bool isVarArg, | ||
const DataLayout &DL) const { | ||
// i128 split into i64 needs to be allocated to two consecutive registers, | ||
// or spilled to the stack as a whole. | ||
return Ty->isIntegerTy(128); | ||
// On x86-64 i128 is split into two i64s and needs to be allocated to two | ||
// consecutive registers, or spilled to the stack as a whole. On x86-32 i128 | ||
// is split to four i32s and never actually passed in registers, but we use | ||
// the consecutive register mark to match it in TableGen. | ||
if (Ty->isIntegerTy(128)) | ||
return true; | ||
|
||
// On x86-32, fp128 acts the same as i128. | ||
if (Subtarget.is32Bit() && Ty->isFP128Ty()) | ||
return true; | ||
|
||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is the right approach; functionArgumentNeedsConsecutiveRegisters
doesn't really seem like the correct thing because the type will never be passed in regesters, but I can't come up with another way to "mark" a register set to indicate that it needs custom lowering. Is there a better way to do this?
Cc @nikic since I think you rewrote the x86-64 custom lowering a couple of times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably also match vector types somehow because _m64
, __m128
, __m256
, and __m512
are specified to have an alignment of 8, 16, 32, and 64 bytes, respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, unfortunately the CC lowering is pretty limited in the information it can access, and backends have a lot of different hacks to work around that. I'd like to improve that, but this looks like an acceptable hack for now.
I also brought up discussion about specifying an ABI for __int128 https://groups.google.com/u/1/g/ia32-abi/c/TuMAt7mwbIU |
9191404
to
3daf079
Compare
Rebased to pick up a relevant test change. Also put up a separate PR with the pretest so that can get its own commit #148753. |
The i386 psABI specifies that `__float128` has 16 byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16-byte. i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128` so this is changed as well. Fixes: llvm#77401
3daf079
to
ccdf8f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from my side.
Looks like this needs a rebase.
The i386 psABI specifies that
__float128
has 16 byte alignment andmust be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.
i386 does not specify an
__int128
, but it seems reasonable to keep thesame behavior as
__float128
so this is changed as well. There alsoisn't a good way to distinguish whether a set of four registers came
from an integer or a float
Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: #77401